
Redshift

Querying Overview

Redshift is a SQL data warehouse from AWS which is generally compatible with standard Postgres. A discussion of the compatibility differences can be found at https://docs.aws.amazon.com/redshift/latest/dg/c_redshift-and-postgres-sql.html

A discussion of the SQL interface to Redshift can be found at https://docs.aws.amazon.com/redshift/latest/dg/cm_chap_SQLCommandRef.html

Querying Context

Database

The Qarbine administrator sets the database of the Qarbine data service when defining the data service. You cannot switch databases mid-session with a command such as USE (found in MySQL and SQL Server; neither Redshift nor Postgres supports it). All queries in the session can only access objects (schemas, tables, etc.) within the configured database. To access another database, you must create a new connection specifying that database.

To see the databases managed by the server you can still run

show databases

Schema

The Qarbine administrator may set a schema value in the data service definition. When set, that schema becomes the first element in the search path. By default, the search path is "$user", public, where "$user" is replaced by the schema matching the current session user's name, if such a schema exists.

The Redshift search path is a session-level setting that determines the order in which schemas are searched when you reference a database object (such as a table or function) by an unqualified name (i.e., without specifying the schema). This mechanism allows you to control which schema Redshift looks in first when resolving object names, which is especially useful if you have objects with the same name in different schemas.
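The resolution order described above can be sketched in plain JavaScript. This is an illustrative model only, not Redshift internals; the catalog contents are hypothetical.

```javascript
// Illustrative sketch - not Redshift internals. An unqualified table name
// is resolved by checking each schema in the search path in order; the
// first schema containing the object wins.
function resolveTable(searchPath, catalog, table) {
  for (const schema of searchPath) {
    if (catalog[schema]?.includes(table)) {
      return `${schema}.${table}`;
    }
  }
  return null; // not found anywhere: a qualified name is required
}

// Hypothetical catalog: both schemas define a "category" table.
const catalog = { tickit: ["category", "sales"], public: ["category"] };
console.log(resolveTable(["tickit", "public"], catalog, "category")); // tickit.category
console.log(resolveTable(["public"], catalog, "sales")); // null
```

Note that when the same name exists in several schemas, reordering the search path changes which object an unqualified reference resolves to.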

To see the effective search path run

show search_path

A sample result is shown below.

  

In this example the data service has a data option of “schema=tickit”. If a table is not found in any schema in the search path, you must use a qualified name (e.g., schema_name.table_name). For more information see https://docs.aws.amazon.com/redshift/latest/dg/r_search_path.html.

With no schema configured, retrieving from the category table requires a schema prefix such as

select * from tickit.category

With the “tickit” schema configured, retrieving from the category table can simply be

select * from category

Referencing Tables

The table reference sample_data_dev.tickit.sales in Amazon Redshift consists of three elements, each specifying a level of organization within the database:

Element          Description
sample_data_dev  The database name. This is the database you are connected to or referencing.
tickit           The schema within the database. Schemas group related tables and other database objects.
sales            The table name. In this case, it refers to the sales fact table in the TICKIT sample schema.

General Data Handling

In general, primitive data types such as numerics, strings, and booleans appear in the answer set as their expected JavaScript values. A discussion of data types can be found at
https://docs.aws.amazon.com/redshift/latest/dg/c_Supported_data_types.html

Handling JSON Data

Redshift does not have a native JSON or JSONB data type as in PostgreSQL. Instead, JSON and other semi-structured data are stored in the SUPER type, which is designed for this purpose. SUPER columns can store complex, nested JSON documents, arrays, and objects. However, the “JSON” row values returned from Redshift are simple JSON strings and not JSON objects. To convert the strings into genuine JSON objects use

#pragma convertToObject CSV_list_of_fields
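Conceptually, the pragma above parses the named string-valued fields of each answer set row into genuine objects. The sketch below is illustrative only, not Qarbine internals; the function and field names are hypothetical.

```javascript
// Illustrative sketch - not Qarbine internals. The pragma conceptually
// parses the named string fields of each answer set row into JSON objects.
function applyConvertToObject(rows, csvFieldList) {
  const fields = csvFieldList.split(",").map((f) => f.trim());
  return rows.map((row) => {
    const copy = { ...row };
    for (const field of fields) {
      if (typeof copy[field] === "string") {
        copy[field] = JSON.parse(copy[field]); // string -> object
      }
    }
    return copy;
  });
}

// A SUPER column arrives from Redshift as a JSON string.
const rows = [{ user_id: 1, profile: '{"name": "Alice", "age": 30}' }];
const converted = applyConvertToObject(rows, "profile");
console.log(converted[0].profile.name); // Alice
```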

A discussion of Redshift JSON related functions can be found at https://docs.aws.amazon.com/redshift/latest/dg/json-functions.html

Consider this table which uses a SUPER column to store JSON strings.

CREATE TABLE user_profiles (
user_id INT,
profile SUPER
);

In this table definition you can store different attributes for different users without altering the table schema. You can insert JSON data using the JSON_PARSE() function to convert a JSON string into the SUPER type:

INSERT INTO user_profiles (user_id, profile) VALUES
(1, JSON_PARSE('{"name": "Alice", "age": 30, "skills": ["SQL", "Python"]}')),
(2, JSON_PARSE('{"name": "Bob", "skills": ["R", "SQL"], "location": "London"}')),
(3, JSON_PARSE('{"name": "Charlie", "hobbies": ["guitar", "skiing"], "age": 28}')),
(4, JSON_PARSE('{"username": "daisy99", "signup_date": "2025-01-01"}'));

Running this query in Redshift Studio

SELECT user_id, profile
FROM user_profiles;

shows

  

The result in the Data Source Designer is shown below.

  

As expected, the user_id is a number and the profile is a string.

Running the following query extracts specific fields from the profile

SELECT  user_id,  profile.name AS name, profile.age as age
FROM user_profiles;

The result in Redshift Studio is

  

The result in Qarbine is

  

Such SQL is fine if the JSON documents have fairly similar content. Often this is not the case: the author of the SQL may not know all the possible fields, or the data may evolve so that a previously satisfactory SQL statement later misses important new data. Here is a query that assumes a lot about the profile JSON content.

SELECT  user_id, profile.name AS name,  profile.age AS age, profile.skills[0] AS primary_skill,
profile.location AS location, profile.hobbies[0] AS primary_hobby,
profile.username AS username, profile.signup_date AS signup_date
FROM user_profiles;

The answer set is shown below.

user_id  name     age   primary_skill  location  primary_hobby  username  signup_date
1        Alice    30    SQL            NULL      NULL           NULL      NULL
2        Bob      NULL  R              London    NULL           NULL      NULL
3        Charlie  28    NULL           NULL      guitar         NULL      NULL
4        NULL     NULL  NULL           NULL      NULL           daisy99   2025-01-01

Vital considerations include how to handle multiple hobbies and skills, and the potential explosion of the answer set size if arrays are unnested. A simpler solution is to have the answer set contain genuine JSON objects.

Manipulating Row Shape

Qarbine can be directed to further modify the answer set elements. This is done using “pragmas”, which are discussed in the general Data Source Designer documentation. The most common pragmas are “convertToObject CSV_list_of_fields” and “pullFieldsUp CSV_list_of_fields”. For example, this query

SELECT user_id, profile
FROM user_profiles;

returns this answer set.

  

Here the profile column is a SUPER column containing JSON strings.

Adding a pragma, the query specification is now

#pragma convertToObject profile
SELECT user_id, profile
FROM user_profiles;

The new answer set is shown below.

  

The profile field values are now JSON objects. The details of the first row are shown below.

  

The pragma converts the profile string value into an object and stores the result back in the field named “profile”. Template formulas can then reference fields using either

@current.profile.someField

or

#profile.someField
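Once profile is a genuine object, such a dotted reference can be resolved by a simple walk of the current row. The sketch below is illustrative only, not Qarbine's formula engine.

```javascript
// Illustrative sketch - not Qarbine's formula engine. A dotted reference
// such as "profile.name" is resolved by walking the current row object.
const current = { user_id: 1, profile: { name: "Alice", age: 30 } };

function fieldValue(row, path) {
  return path.split(".").reduce((obj, key) => obj?.[key], row);
}

console.log(fieldValue(current, "profile.name")); // Alice
console.log(fieldValue(current, "profile.age")); // 30
```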

Multiple pragmas can be sequenced together. For example, this query specification

#pragma convertToObject profile
#pragma pullFieldsUp profile
SELECT user_id, profile
FROM user_profiles;

returns this answer set.

  

In this example the second pragma also moves the child fields of profile up to the top level. The fields that were previously within the profile object have been pulled up a level, which makes it more convenient to reference field values in a template formula. Template formulas can then reference fields using either

@current.someField

or

#someField
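The pull-up step can be sketched in plain JavaScript. This is illustrative only, not Qarbine internals; in particular, whether the parent field itself is retained is an assumption here (it is dropped below).

```javascript
// Illustrative sketch - not Qarbine internals. "pullFieldsUp profile"
// conceptually promotes the child fields of the named object field to the
// top level of each row. Dropping the parent field is an assumption.
function applyPullFieldsUp(rows, csvFieldList) {
  const fields = csvFieldList.split(",").map((f) => f.trim());
  return rows.map((row) => {
    const copy = { ...row };
    for (const field of fields) {
      const child = copy[field];
      if (child && typeof child === "object") {
        delete copy[field];
        Object.assign(copy, child); // promote child fields one level up
      }
    }
    return copy;
  });
}

const rows = [{ user_id: 1, profile: { name: "Alice", age: 30 } }];
const flat = applyPullFieldsUp(rows, "profile");
console.log(flat[0]); // { user_id: 1, name: 'Alice', age: 30 }
```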

Array Handling

In Amazon Redshift, you can store and query arrays using the SUPER data type, which supports storing arrays as part of JSON-like documents. Consider this table definition.

CREATE TABLE user_skills (
user_id INT,
skills SUPER
);

It is populated using

INSERT INTO user_skills (user_id, skills) VALUES
(1, JSON_PARSE('["SQL", "Python", "Redshift"]')),
(2, JSON_PARSE('["R", "SQL"]')),
(3, JSON_PARSE('["Java", "Scala", "Spark", "Python"]'));

Running this query

SELECT user_id, skills
FROM user_skills;

returns the answer set

  

Applying a pragma

#pragma convertToObject skills
SELECT user_id, skills
FROM user_skills;

returns this answer set.

  

The first row details are

  

Unnesting Arrays

Here is a query that “unnests” array values using Redshift’s PartiQL-style navigation, in which the SUPER array is aliased in the FROM clause and iterated. (Note that LATERAL FLATTEN is Snowflake syntax and is not supported by Redshift.)

SELECT u.user_id, s AS skill
FROM user_skills u, u.skills s;

For each element in the array, an answer set row is created. This can lead to rapidly increasing answer set size with a lot of redundant data because of the array element multiplier.

user_id  skill
1        SQL
1        Python
1        Redshift
2        R
2        SQL
3        Java
3        Scala
3        Spark
3        Python
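The multiplier effect of unnesting can be sketched client-side. This is an illustrative model of what unnesting does to answer set size, not Qarbine or Redshift code.

```javascript
// Sketch of the unnesting multiplier: each input row produces one output
// row per array element, duplicating the scalar columns every time.
function unnest(rows, arrayField, asName) {
  return rows.flatMap((row) => {
    const { [arrayField]: values, ...rest } = row;
    return values.map((value) => ({ ...rest, [asName]: value }));
  });
}

const rows = [
  { user_id: 1, skills: ["SQL", "Python", "Redshift"] },
  { user_id: 2, skills: ["R", "SQL"] },
  { user_id: 3, skills: ["Java", "Scala", "Spark", "Python"] },
];
const flat = unnest(rows, "skills", "skill");
console.log(flat.length); // 9 rows from 3 input rows
```

With several array columns the effect compounds, since each unnesting step multiplies the row count again.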

The preferred approach, and the one Qarbine favors, is not to unnest arrays at all and instead interact with the JSON objects in their natural form.

System Tables

The query

SELECT * FROM information_schema.tables

does not show any tables in Amazon Redshift because Redshift does not fully support the standard information_schema views as in PostgreSQL or other relational databases. Redshift provides its own set of system tables and views for metadata, and the information_schema is either incomplete or not populated in the same way.

For more information see
https://docs.aws.amazon.com/redshift/latest/gsg/t_querying_redshift_system_tables.html
https://docs.aws.amazon.com/redshift/latest/dg/c_sampledb.html

Metadata Source            Works in Redshift Query Editor  Works in Node.js Client  Recommended?
information_schema.tables  Yes (sometimes, not reliable)   No                       No
PG_TABLE_DEF               Yes                             Yes                      Yes
SVV_TABLES                 Yes                             Yes                      Yes

Sample Data Sets

Redshift provides several sample data sets. The TICKIT sample data set used by the Amazon Redshift documentation examples can be loaded by following the Amazon Redshift documentation. Its relationships are shown below.

  

More information on the TICKIT sample database can be found at https://docs.aws.amazon.com/redshift/latest/dg/c_sampledb.html

Troubleshooting

If errors occur while using Qarbine then a good course of action is to run the query using your traditional Postgres tools, such as psql or pgAdmin.

The Query Profiler in the Redshift console lets you analyze query execution plans, identify bottlenecks, and see where time is spent (e.g., on nested loop joins or data distribution steps).

Reference

For more information see https://docs.aws.amazon.com/redshift/